Towards Full Automation Of Lexicon Construction
Abstract
We describe work in progress aimed at developing methods for automatically constructing a lexicon using only statistical data derived from analysis of corpora, a problem we call lexical optimization. Specifically, we use statistical methods alone to obtain information equivalent to syntactic categories, and to discover the semantically meaningful units of text, which may be multi-word units or polysemous terms-in-context. Our guiding principle is to employ a notion of “meaningfulness” that can be quantified information-theoretically, so that plausible variants of a lexicon can be judged relative to each other. We describe a technique of this nature called information-theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization. We conclude by describing our plans for further improvements, and for applying the same mathematical principles to other problems in natural language processing.
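The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea behind information-theoretic co-clustering as it is commonly formulated: a grouping of words and of contexts is scored by how much of the word-context mutual information it preserves, which lets two candidate groupings be compared numerically. The toy counts and the function names `mutual_information` and `merge` are hypothetical, and this is not claimed to be the authors' method.

```python
import math


def mutual_information(counts):
    """Mutual information (in bits) of the joint distribution implied by a count matrix."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
                mi += (c / total) * math.log2(c * total / (row_sums[i] * col_sums[j]))
    return mi


def merge(counts, row_clusters, col_clusters):
    """Aggregate a count matrix into a cluster-by-cluster count matrix."""
    n_rows = max(row_clusters) + 1
    n_cols = max(col_clusters) + 1
    merged = [[0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            merged[row_clusters[i]][col_clusters[j]] += c
    return merged


# Hypothetical toy data: rows are words, columns are contexts.
counts = [
    [4, 4, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
]

mi_full = mutual_information(counts)
# Grouping that respects the block structure of the data.
mi_good = mutual_information(merge(counts, [0, 0, 1, 1], [0, 0, 1, 1]))
# Grouping that mixes the two blocks together.
mi_bad = mutual_information(merge(counts, [0, 1, 0, 1], [0, 1, 0, 1]))

print(f"full: {mi_full:.3f}  good grouping: {mi_good:.3f}  bad grouping: {mi_bad:.3f}")
```

On this toy matrix the first grouping keeps all of the original mutual information while the second keeps none, which is the sense in which one variant of a lexicon can be judged against another.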
Similar resources
On-Line Character Analysis and Recognition With Fuzzy Neural Networks
A new recognition system based on a neuro-fuzzy system, called FasArt, is proposed in this paper. Satisfactory results were obtained using the train_r01_v02 UNIPEN dataset, together with a comparison with the recognition rates achieved by independent human testers. Two methods for segmenting handwritten components into strokes are proposed, with better experimental results for the method based ...
ITRI-00-37 Semi-automatic construction of multilingual lexicons
The construction of lexicons for NLP applications is a potentially very expensive task, but a crucially important one, especially in multilingual applications. The automation of the task from generic data sources or corpora is as yet largely impractical for most applied systems. In this paper we describe a methodology for the semi-automation of the task, used in the CLIME project to develop bil...
Lexicon Based Ontology Construction
Researchers from industry and academia are now exploring the possibility of creating a "Semantic Web," in which meaning is made explicit, allowing machines to process and integrate Web resources intelligently. This technology will allow interoperability among development of intelligent internet agents in large scale, facilitating communication between a multitude of heterogeneous web-accessible...
3-D Graphical Visualization for Construction Automation
The availability of low-cost, high-performance computers that are capable of real-time 3-D graphic simulation has led to a plethora of applications in the construction industry. This technology is particularly beneficial in the design and simulation of automated construction systems. The expense of physically constructing and implementing a full-scale prototype automated construction systems h...
A Proposed “model for Adoption” of High Technology Products (robots) for Indian Construction Industry
The construction industry is considered labour intensive, short of skilled labour, and unsafe, with a large number of industrial accidents. It requires high-technology automation products (robots) to improve productivity, safety, quality, etc. Robots are developed by various countries in different areas like demolition, earthwork, bridge, tunnels, road work, underwater wo...